-
Notifications
You must be signed in to change notification settings - Fork 554
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fix] Using a pure Go parser for Python #2320
base: main
Are you sure you want to change the base?
Conversation
FYI, @dougthor42, you might be using the feature of comments in the source code, would you have time to check this PR please? |
hmm, this only support Python 3.4 or below... Marking it as draft for now |
@linzhp Any update on this? TBH it kinda looks like github.com/go-python/gpython is defunct. It hasn't had any updates since Feb. I've run into another issue with the
I would be quite sad if gazelle annotations (comments in python files) were removed. That's the only way we've been able to deal with pre-existing circular imports within our code base when migrating to Bazel. |
I wonder if it would be easier to just make our own go parser for python? I've already done a bit while working on a hack for #2396. The key thing to note is that Gazelle only needs to know a very small subset of python grammar:
For the "happy path", comments and Things start to get tricky when:
For the record, here's some very-WIP pure go parser for the happy path above. This is what I'm playing with to address #2396 // file_parser.go
// A really really dumb python parser that's "good enough". Hopefully.
func (p *FileParser) naiveFileParser(code []byte) (*ParserOutput, error) {
codeStr := string(code[:])
// Parse imports
modules, err := p.naiveImportParser(codeStr)
if err != nil {
return nil, err
}
p.output.Modules = modules
// Parse comments
// TODO: find comment-in-string or comment-in-comment cases
commentPattern := regexp.MustCompile(`(?m)#.+$`)
commentStrings := commentPattern.FindAllString(codeStr, -1)
var comments []comment
for _, s := range commentStrings {
comments = append(comments, comment(s))
}
p.output.Comments = comments
// Parse 'if __name__' block
// TODO: Support other similar forms like b == a or single quotes
p.output.HasMain = strings.Contains(codeStr, `if __name__ == "__main__":`)
return &p.output, nil
}
func indexOf(element string, data []string) (uint32, error) {
for k, v := range data {
if element == v {
return uint32(k), nil
}
}
err := errors.New(fmt.Sprintf("Did not find %q in data.", element))
return 0, err
}
// naiveImportParser runs a very naive, mostly regex-based, parse of the python source code
// as a string.
// NOTE: This does not support multi-line imports like
// from foo import (
// bar,
// baz
// )
// Nor does this _really_ support comma-separated like
// from collections.abc import Callable, Iterator
func (p *FileParser) naiveImportParser(code string) ([]module, error) {
// Find `import a`, `from a import b`, `import a as b`, `from a import b as c`
// The pattern doesn't need to be bullet-proof; we'll remove things like comments later.
importPattern := regexp.MustCompile(`(?m)^(\w*import .+)|(\w*from .+ import .+)( as .+)?`)
importStatements := importPattern.FindAllString(code, -1)
// Also support Windows files >:-(
lines := strings.Split(strings.ReplaceAll(code, "\r\n", "\n"), "\n")
var modules []module
for _, v := range importStatements {
lineNum, err := indexOf(strings.TrimSpace(v), lines)
if err != nil {
return nil, err
}
// Remove any comments that might be present.
statement, _, _ := strings.Cut(v, "#")
statement = strings.TrimSpace(statement)
// At this point we have something like "import foo", "import foo as bar",
// "from foo import bar", or "from foo import bar as baz"
// So we need to "do the needful" to make the dotted import 'foo.bar'
dottedImport := ""
from := ""
// We don't care about aliases. Drop them.
statement, _, _ = strings.Cut(statement, " as ")
// 'from foo import bar' is the same as 'import foo.bar'. Because we don't need
// to support 'from foo import some_attribute'.
// dougthor42 is lucky here. A "real" implementation might need to worry about this.
if strings.HasPrefix(statement, "from ") {
first, second, _ := strings.Cut(statement[5:], " import ")
// second might be a list of items "from foo import A, B, c, d"
// If that's the case, we're (probably) importing Attributes so we don't
// need any of them.
// TODO: The more correct thing to do here would be to make N module structs.
// But for now, I'm lazy.
if strings.Contains(second, ",") {
second = strings.Split(second, ",")[0]
}
// "from foo import bar" --> "foo.bar"
dottedImport = fmt.Sprintf("%s.%s", first, second)
from = first
} else {
// "import foo.bar" --> "foo.bar"
dottedImport = statement[7:]
}
mod := module{
Name: dottedImport,
LineNumber: lineNum,
Filepath: p.relFilepath,
From: from,
}
modules = append(modules, mod)
}
return modules, nil
} |
Thanks for being interested. My next attempt would be using antlr to parse Python, but I haven't got there yet. Maybe you can give it a try?
Can you use |
The readme says that grammar only targets python 3.6 🫤
Sadly no, as that would apply at the Bazel package level and thus impact all targets. We need target-level (which for us are file-level) ignores I see a couple options:
|
Replacing tree-sitting with github.com/go-python/gpython, which is in pure Go. This re-enable the ability to cross-compile Gazelle from other platforms to macOS.
Note that gpython doesn't preserve Python comments, while Python extension read Gazelle directives in Python comments (not just BUILD.bazel file comments), I have to scan the Python file for a second time to find the comments, but this doesn't include inline comments. So this will break existing usage of Gazelle directives inline Python comments, which I think should be very rare. In fact, unless I miss anything, I don't think we should allow Gazelle directives in Python comments in the first place. They are designed to be in BUILD.bazel files.
Fixes #1913
@alexeagle FYI.